PAC-Bayes Analysis
A Limitation of the PAC-Bayes Framework
PAC-Bayes is a useful framework for deriving generalization bounds, introduced by McAllester ('98). This framework has the flexibility of deriving distribution- and algorithm-dependent bounds, which are often tighter than VC-related uniform convergence bounds. In this manuscript we present a limitation of the PAC-Bayes framework. We demonstrate an easy learning task that is not amenable to a PAC-Bayes analysis. Specifically, we consider the task of linear classification in 1D; it is well known that this task is learnable using just $O(\log(1/\delta)/\epsilon)$ examples. On the other hand, we show that this fact cannot be proved using a PAC-Bayes analysis: for any algorithm that learns 1-dimensional linear classifiers there exists a (realizable) distribution for which the PAC-Bayes bound is arbitrarily large.
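For context, one standard (McAllester-style) form of the bound in question, with the exact constants varying across statements: for a data-free prior $P$ over hypotheses and an i.i.d. sample $S$ of size $m$, with probability at least $1-\delta$, simultaneously for all posteriors $Q$,

$$\mathbb{E}_{h\sim Q}\big[L_{\mathcal{D}}(h)\big] \;\le\; \mathbb{E}_{h\sim Q}\big[L_{S}(h)\big] \;+\; \sqrt{\frac{\mathrm{KL}(Q\,\|\,P)+\ln(m/\delta)}{2(m-1)}},$$

where $L_{\mathcal{D}}$ and $L_S$ denote the true and empirical risks. The limitation above says that for 1D linear classification, no choice of prior and learned posterior keeps the right-hand side small for every realizable distribution.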
PAC-Bayes Analysis Beyond the Usual Bounds
Rivasplata, Omar, Kuzborskij, Ilja, Szepesvari, Csaba, Shawe-Taylor, John
We focus on a stochastic learning model where the learner observes a finite set of training examples and the output of the learning process is a data-dependent distribution over a space of hypotheses. The learned data-dependent distribution is then used to make randomized predictions, and the high-level theme addressed here is guaranteeing the quality of predictions on examples that were not seen during training, i.e. generalization. In this setting the unknown quantity of interest is the expected risk of the data-dependent randomized predictor, for which upper bounds can be derived via a PAC-Bayes analysis, leading to PAC-Bayes bounds. Specifically, we present a basic PAC-Bayes inequality for stochastic kernels, from which one may derive extensions of various known PAC-Bayes bounds as well as novel bounds. We clarify the role of the requirements of fixed 'data-free' priors, bounded losses, and i.i.d. data. We highlight that those requirements were used to upper-bound an exponential moment term, while the basic PAC-Bayes theorem remains valid without those restrictions. We present three bounds that illustrate the use of data-dependent priors, including one for the unbounded square loss.
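A hedged sketch of the template behind such results (notation assumed here, not the paper's exact statement): by the change-of-measure (Donsker-Varadhan) inequality and Markov's inequality, for any measurable function $f(h,S)$ and any data-free prior $P$, with probability at least $1-\delta$ over the sample $S$, simultaneously for all posteriors $Q$,

$$\mathbb{E}_{h\sim Q}\big[f(h,S)\big] \;\le\; \mathrm{KL}(Q\,\|\,P) \;+\; \ln\frac{\xi}{\delta}, \qquad \xi \;=\; \mathbb{E}_{S}\,\mathbb{E}_{h\sim P}\big[e^{f(h,S)}\big].$$

The exponential moment $\xi$ is exactly where bounded losses and i.i.d. data are traditionally invoked; the point made above is that the basic inequality itself does not require those restrictions.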
A PAC-Bayes Analysis of Adversarial Robustness
We propose the first general PAC-Bayesian generalization bounds for adversarial robustness, which estimate, at test time, how invariant a model will be to imperceptible perturbations of the input. Instead of deriving a worst-case analysis of the risk of a hypothesis over all possible perturbations, we leverage the PAC-Bayesian framework to bound the risk of majority votes (over the whole class of hypotheses) averaged over the perturbations. Our theoretically founded analysis has the advantage of providing general bounds (i) that are valid for any kind of attack, (ii) that are tight thanks to the PAC-Bayesian framework, and (iii) that can be directly minimized during the learning phase to obtain a model robust to different attacks at test time.
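To make the "averaged risk on the perturbations" concrete, here is a minimal, hypothetical Python sketch: it estimates the majority vote's risk averaged over sampled perturbations rather than maximized over them. The function names and the uniform perturbation model are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def majority_vote(classifiers, x):
    """Majority vote over a finite set of classifiers with labels in {-1, +1}."""
    total = sum(c(x) for c in classifiers)
    return 1 if total >= 0 else -1

def averaged_adversarial_risk(classifiers, X, y, eps=0.1, n_perturb=100, seed=0):
    """Monte Carlo estimate of the majority vote's risk averaged over random
    perturbations of sup-norm at most eps (uniform noise is an illustrative
    perturbation model, not the paper's)."""
    rng = np.random.default_rng(seed)
    errors = 0
    for xi, yi in zip(X, y):
        for _ in range(n_perturb):
            delta = rng.uniform(-eps, eps, size=np.shape(xi))  # sampled perturbation
            errors += majority_vote(classifiers, xi + delta) != yi
    return errors / (len(X) * n_perturb)
```

A worst-case analysis would instead replace the inner average with a maximum over perturbations; the averaged quantity is what the PAC-Bayesian machinery bounds here.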
A Limitation of the PAC-Bayes Framework
The classical setting of supervised binary classification considers learning algorithms that receive (binary) labelled examples and are required to output a predictor or a classifier that predicts the label of new and unseen examples. Within this setting, Probably Approximately Correct (PAC) generalization bounds quantify the success of an algorithm at approximately predicting with high probability. The PAC-Bayes framework, introduced in [24, 37] and further developed in [23, 22, 33], provides PAC-flavored bounds for Bayesian algorithms that produce Gibbs-classifiers (also called stochastic classifiers): algorithms that, instead of outputting a single classifier, output a probability distribution over the family of classifiers. Their performance is measured by the expected success of prediction, where the expectation is taken with respect to both the sampled data and the sampled classifier. A PAC-Bayes generalization bound relates the generalization error of the algorithm to a KL distance between the stochastic output classifier and some prior distribution P. In more detail, the generalization bound comprises two terms: first, the empirical error of the output Gibbs-classifier, and second, the KL distance between the output Gibbs-classifier and some arbitrary (but sample-independent) prior distribution. This standard bound captures a basic intuition: a good learner needs to balance bias, manifested in the form of a prior, against fitting the data, which is measured by the empirical loss.

A natural task, then, is to try and characterize the potential as well as the limitations of such Gibbs-learners that are amenable to PAC-Bayes analysis. As far as the potential goes, several past results established the strength and utility of this framework (e.g.
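To illustrate the interplay of the two terms, here is a minimal Python sketch that evaluates a McAllester-style bound for a Gibbs classifier over a finite hypothesis class. The bound form is the one stated earlier; the hypothesis class, prior, and posterior below are illustrative assumptions, not anything from the paper.

```python
import numpy as np

def kl_divergence(q, p):
    """KL(Q || P) for discrete distributions over a finite hypothesis class."""
    q, p = np.asarray(q, dtype=float), np.asarray(p, dtype=float)
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

def pac_bayes_bound(empirical_errors, q, p, m, delta=0.05):
    """McAllester-style bound: empirical Gibbs error plus a complexity term
    trading off KL(Q||P) against the sample size m."""
    gibbs_empirical = float(np.dot(q, empirical_errors))  # E_{h~Q} L_S(h)
    complexity = np.sqrt((kl_divergence(q, p) + np.log(m / delta)) / (2 * (m - 1)))
    return gibbs_empirical + complexity

# Illustrative numbers: 3 hypotheses, uniform (sample-independent) prior,
# posterior concentrated on the hypothesis with lowest empirical error.
errs = [0.10, 0.25, 0.40]        # empirical errors L_S(h)
prior = [1/3, 1/3, 1/3]          # prior P
posterior = [0.8, 0.1, 0.1]      # learned posterior Q
print(pac_bayes_bound(errs, posterior, prior, m=1000))
```

Concentrating the posterior further on the best hypothesis lowers the empirical term but inflates the KL term, which is exactly the bias-fitting trade-off described above.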
PAC-Bayes Analysis of Sentence Representation
Learning sentence vectors from an unlabeled corpus has attracted attention because such vectors can represent sentences in a lower-dimensional, continuous space. Simple heuristics based on pre-trained word vectors are widely applied to machine learning tasks. However, they are not well understood from a theoretical perspective. We analyze learning sentence vectors from a transfer-learning perspective by using a PAC-Bayes bound that enables us to understand existing heuristics. We show that simple heuristics such as averaging and inverse document frequency weighted averaging can be derived from our formulation. Moreover, we propose novel sentence-vector learning algorithms on the basis of our PAC-Bayes analysis.
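The two heuristics mentioned, plain averaging and inverse document frequency (IDF) weighted averaging, are easy to state concretely. A minimal Python sketch, assuming pre-trained word vectors in a dict and document frequencies computed from a corpus (all names here are illustrative):

```python
import numpy as np

def average_sentence_vector(tokens, word_vecs):
    """Plain averaging: mean of the pre-trained vectors of the sentence's words.
    Assumes at least one token is in the vocabulary."""
    vecs = [word_vecs[w] for w in tokens if w in word_vecs]
    return np.mean(vecs, axis=0)

def idf_weighted_sentence_vector(tokens, word_vecs, doc_freq, n_docs):
    """IDF-weighted averaging: words that are rare across the corpus get
    more weight (a common smoothed IDF; an illustrative choice)."""
    vecs, weights = [], []
    for w in tokens:
        if w in word_vecs:
            vecs.append(word_vecs[w])
            weights.append(np.log(n_docs / (1 + doc_freq.get(w, 0))))
    return np.average(vecs, axis=0, weights=weights)
```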